135 research outputs found
Multi-objective scheduling of Scientific Workflows in multisite clouds
Clouds appear as appropriate infrastructures for executing Scientific Workflows (SWfs). A cloud is typically made of several sites (or data centers), each with its own resources and data. Thus, it becomes important to be able to execute some SWfs at more than one cloud site because of the geographical distribution of data or available resources among different cloud sites. A major problem is therefore how to execute a SWf in a multisite cloud while reducing both execution time and monetary cost. In this paper, we propose a general solution based on multi-objective scheduling for executing SWfs in a multisite cloud. The solution consists of a multi-objective cost model covering execution time and monetary cost, a Single Site Virtual Machine (VM) Provisioning approach (SSVP), and ActGreedy, a multisite scheduling approach. We present an experimental evaluation based on the execution of the SciEvol SWf in the Microsoft Azure cloud. The results reveal that our scheduling approach significantly outperforms two adapted baseline algorithms (which we propose by adapting two existing algorithms) and that its scheduling time is reasonable compared with genetic and brute-force algorithms. The results also show that our cost model is accurate and that SSVP generates better VM provisioning plans than an existing approach. Work partially funded by the EU H2020 Programme and MCTI/RNP-Brazil (HPC4E grant agreement number 689772), CNPq, FAPERJ, INRIA (MUSIC project), and Microsoft (ZcloudFlow project), and performed in the context of the Computational Biology Institute (www.ibc-montpellier.fr). We would like to thank Kary Ocaña for her help in modeling and executing the SciEvol SWf.
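The abstract does not spell out the cost model's form; a minimal sketch of one common formulation is a weighted sum of normalized execution time and monetary cost, used to greedily rank candidate scheduling plans. All names, weights, and the greedy stand-in below are illustrative assumptions, not the paper's actual SSVP/ActGreedy definitions.

from dataclasses import dataclass

@dataclass
class Plan:
    """A candidate scheduling plan for a workflow fragment (illustrative only)."""
    name: str
    exec_time: float      # estimated execution time, e.g. in hours
    monetary_cost: float  # estimated cost, e.g. in dollars

def weighted_cost(plan, w_time, w_money, t_max, m_max):
    """Weighted sum of normalized objectives; lower is better.

    t_max / m_max are the user's desired bounds, used to put both
    objectives on a comparable, dimensionless scale.
    """
    return w_time * plan.exec_time / t_max + w_money * plan.monetary_cost / m_max

def greedy_pick(plans, w_time=0.5, w_money=0.5, t_max=10.0, m_max=100.0):
    """Greedily pick the plan minimizing the multi-objective cost
    (a hypothetical stand-in for a per-fragment scheduling decision)."""
    return min(plans, key=lambda p: weighted_cost(p, w_time, w_money, t_max, m_max))

if __name__ == "__main__":
    candidates = [
        Plan("single-site, large VMs", exec_time=2.0, monetary_cost=80.0),
        Plan("two sites, small VMs", exec_time=5.0, monetary_cost=30.0),
    ]
    print(f"chosen plan: {greedy_pick(candidates).name}")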
Sharing scientific experiments and workflows in environmental applications
Environmental applications have been stimulating cooperation among scientists from different disciplines. There are many examples where this cooperation takes place through exchanging scientific resources, such as data, programs, and mathematical models. The LeSelect architecture supports environmental applications in which scientists may share their data and programs. We believe that, besides programs and data, models as well as experiments and workflows are scientific resources that need to be shared in environmental applications. Therefore, in this paper we propose an extension to the LeSelect architecture that allows sharing of models, experiments, and workflows.
Grid Data Management: Open Problems and New Issues
Initially developed for the scientific community, Grid computing is now gaining much interest in important areas such as enterprise information systems. This makes data management critical, since the techniques must scale up while addressing the autonomy, dynamicity, and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques. Then we detail the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. This implies revisiting distributed database techniques in major ways, in particular by using P2P techniques.
ProvLight: Efficient Workflow Provenance Capture on the Edge-to-Cloud Continuum
Modern scientific workflows require hybrid infrastructures combining numerous decentralized resources on the IoT/Edge interconnected to Cloud/HPC systems (aka the Computing Continuum) to enable their optimized execution. Understanding and optimizing the performance of such complex Edge-to-Cloud workflows is challenging. Capturing the provenance of key performance indicators, with their related data and processes, may assist in understanding and optimizing workflow executions. However, the capture overhead can be prohibitive, particularly on resource-constrained devices such as those on the IoT/Edge. To address this challenge, based on a performance analysis of existing systems, we propose ProvLight, a tool to enable efficient provenance capture on the IoT/Edge. We leverage simplified data models, data compression and grouping, and lightweight transmission protocols to reduce overheads. We further integrate ProvLight into the E2Clab framework to enable workflow provenance capture across the Edge-to-Cloud Continuum. This integration makes E2Clab a promising platform for the performance optimization of applications through reproducible experiments. We validate ProvLight at large scale with synthetic workloads on 64 real-life IoT/Edge devices in the FIT IoT LAB testbed. Evaluations show that ProvLight outperforms state-of-the-art systems like ProvLake and DfAnalyzer on resource-constrained devices: ProvLight is 26--37x faster to capture and transmit provenance data, uses 5--7x less CPU and 2x less memory, transmits 2x less data, and consumes 2--2.5x less energy. ProvLight and E2Clab are available as open-source tools.
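The abstract names the ingredients of ProvLight's overhead reduction (simplified data models, grouping, compression, lightweight protocols) without giving its capture API; below is a minimal sketch of that batching-plus-compression idea, assuming a hypothetical flat record format and an abstract send() transport. ProvLight's real data model and protocol may differ.

import json
import zlib

class LightweightCapture:
    """Groups provenance records into batches and compresses them before
    transmission, trading a little latency for lower CPU and network cost."""

    def __init__(self, send, batch_size=50):
        self.send = send            # transport callback, e.g. a publish() of some protocol
        self.batch_size = batch_size
        self.buffer = []

    def record(self, task_id, key, value):
        # Simplified, flat provenance record (illustrative schema).
        self.buffer.append({"t": task_id, "k": key, "v": value})
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        payload = zlib.compress(json.dumps(self.buffer).encode("utf-8"))
        self.send(payload)
        self.buffer = []

if __name__ == "__main__":
    sent = []
    cap = LightweightCapture(send=sent.append, batch_size=2)
    cap.record("task-1", "start_time", 1700000000.0)
    cap.record("task-1", "cpu_percent", 41.5)  # triggers a flush at batch_size=2
    print(f"batches sent: {len(sent)}, compressed bytes: {len(sent[0])}")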
Mediators Metadata Management Services: An Implementation Using GOA++ System
The main contribution of this work is the development of a Metadata Manager to interconnect heterogeneous and autonomous information sources in a flexible, extensible, and transparent way. Interoperability at the semantic level is achieved using an integration layer, structured hierarchically and based on the concept of Mediators. The services of a Mediator Metadata Manager (MMM) are specified and implemented using functions based on the Outlines of GOA++. The MMM services are available in the form of a GOA++ API and can be accessed remotely via CORBA or through local API calls.
BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the case studies' execution time by up to 98%. We also show how applying machine learning techniques can enrich the analysis process.
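BioWorkbench's actual provenance schema is not given in the abstract; a minimal sketch of the kind of query its web application might abstract is shown below, assuming a hypothetical task table with per-activity timing. The table, column names, and sample rows are illustrative, not BioWorkbench's real database.

import sqlite3

# Hypothetical provenance schema: one row per executed workflow activity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (
        workflow  TEXT,
        activity  TEXT,
        start_ts  REAL,
        end_ts    REAL
    );
    INSERT INTO task VALUES
        ('SwiftPhylo', 'align',      0.0,   120.0),
        ('SwiftPhylo', 'build_tree', 120.0, 300.0),
        ('SwiftGECKO', 'compare',    0.0,   450.0);
""")

# A typical provenance query: per-activity runtime within each workflow,
# the sort of aggregation a web UI would expose as a pre-built report.
rows = conn.execute("""
    SELECT workflow,
           activity,
           end_ts - start_ts AS duration_s
    FROM task
    ORDER BY workflow, duration_s DESC
""").fetchall()

for workflow, activity, duration in rows:
    print(f"{workflow:10s} {activity:12s} {duration:7.1f} s")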
SARAVÁ: data sharing for online communities in P2P
This paper describes SARAVÁ, a research project that investigates new challenges in P2P data sharing for online communities. The major advantage of P2P is a completely decentralized approach to data sharing that does not require centralized administration. Users may be numerous and interested in different kinds of collaboration, sharing their knowledge, ideas, experiences, etc. Data sources may also be numerous, fairly autonomous (i.e., locally owned and controlled), and highly heterogeneous, with different semantics and structures. Our project addresses new, decentralized data management techniques that scale up while handling the autonomy, dynamic behavior, and heterogeneity of both users and data sources. In this context, we focus on two major problems: query processing with uncertain data and management of scientific workflows.
Enhancing Energy Production with Exascale HPC Methods
High Performance Computing (HPC) resources have become the key enabler for achieving more ambitious challenges in many disciplines. In this step beyond, an explosion in the available parallelism and the use of special-purpose processors are crucial. With such a goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them where necessary and going beyond the state of the art in the HPC exascale simulations required for different energy sources. In this paper, a general overview of these methods is presented, as well as some specific preliminary results. The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E project (www.hpc4e.eu), grant agreement no. 689772; the Spanish Ministry of Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R); and the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP). Computer time on the Endeavour cluster was provided by Intel Corporation, which enabled us to obtain the presented experimental results in uncertainty quantification in seismic imaging.
- …